ABSTRACT

This paper offers examples of the ways in which 
research influences national policies related to assessment and 
program evaluation. It describes an ongoing evaluation being 
conducted by the National Academy of Education (NAE) of the "Trial
State Assessment," a piece of the National Assessment of Educational
Progress (NAEP). At this interim stage, the NAE study has been well
received by Congress and other audiences because the NAE panel: (1) 
covered the topics requested by Congress; (2) did its homework; (3) 
comported itself impartially; (4) provided clearly written results; 
(5) observed Washington etiquette regarding submission; and (6) 
briefed Congress personally. Any external evaluation group must 
guarantee impartial research, overcome Congressional distrust of
researchers, resist pressure from contending parties, move from broad
legislative objectives to specific researchable problems, and specify
deadlines and formats for clear reporting. A conclusion is that
researchers can strongly influence national policy when they have
evidence for their conclusions; their work deals with problems faced
by policymakers; and they clearly state their conclusions. Appendices
contain pertinent legislation from the General Education Provisions
Act, a list of studies completed by the NAE evaluation panel, and
topics of NAE research. (LMI)




Remarks prepared by Emerson J. Elliott
for AERA session 1.11, Monday April 4, 1994
"Research and Reform: Stories of Structure, Strategy and Suspense"

INTRODUCTION

My role as a member of this panel is to provide examples of "How research influences
national policies related to assessment and program evaluation." 

Frequently we hear words of despair about the effect of the work of researchers on 
public policy decisions: 

o researchers don't understand our problems 

o researchers write technical jargon 

o researchers deal in theories rather than issues 

All true. 

Also true, however, is that the work of researchers is USED in decisionmaking, is 
frequently SOUGHT OUT. 

EXAMPLES OF RESEARCH THAT INFLUENCED NATIONAL POLICIES

Let me mention some examples of panels and studies that have been comprised of, or 
have drawn on, researchers and their work in the areas of assessment and program 
evaluation. 

o The 1983 National Commission on Excellence in Education, which
produced its report, A Nation at Risk, commissioned 41 papers, made
extensive use of statistical data, and drew on in-depth analyses to drive
home its points. The report was a major factor in precipitating State
legislated education reforms during the 1980s, reforms that failed to
achieve the results policymakers intended and that set the stage for the
more recent adoption of National Education Goals.

o The 1989 National Assessment of Vocational Education contracted for 30 
inter-related research studies. It described the implementation of Federal 
vocational education laws and concluded that the statutory provisions were 
too weak to accomplish the Congressional goals. The Assessment provided 
information, available for the first time from NCES studies, that showed 
most high school students take at least one or two vocational education 
courses, although the traditional concept of vocational "programs" was not
commonly applied and not useful for understanding the effects of the law. 
As recommended by the Assessment, Congress dropped numerous 
categorical set asides in the Act, provided a floor on the average grant size, 
distinguished secondary from post-secondary activities, and encouraged a 
strong link between academic and work-skill content for vocational 
students. 

o The 1987 Alexander-James study group on the National Assessment of
Educational Progress commissioned 46 research papers, included an
independent critique from members of the National Academy of 
Education, and convened nine subgroups involving 64 experts, mostly 
researchers, on topics ranging from cognitive skills assessments and reading 
assessment to design and structure of NAEP and costs. The Panel's 
recommendations provided the basis for legislative changes in 1988 that 
authorized a State component for NAEP and created the National 
Assessment Governing Board. 

o The National Education Goals Panel has created "resource groups" and
"technical planning groups" and has drawn on expert consultants for each of
the Goals. The most recent listing included 146 individuals, a heavy 
proportion of them academic researchers, involved in these capacities. 
Another 91 individuals assisted the Panel in acquiring data for use in the 
annual report on U. S. progress toward the goals. As you know, the topics 
here include readiness of children for school, school completion, student 
achievement, adult literacy and college achievement, and safe schools-all 
now enacted into the GOALS 2000 legislation that President Clinton has 
just signed into law. 

o The Federal Government's primary grant of assistance for education of
disadvantaged students, Chapter I, has been extensively evaluated over its
nearly three-decade existence. Within the last year we have witnessed the
completion of (a) the National Assessment of Chapter I, mandated by
Congress four years ago, (b) an Independent Review Panel for the
National Assessment, also mandated by law, (c) an Independent
Commission on Chapter I (chaired by David Hornbeck and funded by the
MacArthur Foundation), and (d) a RAND study entitled "Federal
Policy Options for Improving the Education of Low-Income Students."
These studies and panels, each drawing on members of the research
community for advice or conduct of empirical work or analysis, share a
number of common perspectives. They assert that the learning goals for
low-income or disadvantaged children must be the same as those for all
our children. They call for "school-wide" approaches to instruction, rather 
than pulling children out of class. They propose that professional 
development, instruction, and assessments be linked to curricular goals. 
And they call for assessments more closely related to the actual knowledge 
and skills that students are expected to master. 

Well, there could be many more such examples (see Note A) where the work of
researchers related to assessments and program evaluations has influenced Federal
policies, but I think these will serve to demonstrate that such work is both sought out 
and used in making national policy. 

A CASE STUDY-EVALUATION OF THE TRIAL STATE ASSESSMENT 

Let me make this more concrete through a case study. The example is an on-going 
evaluation by the National Academy of Education of the "Trial State Assessment," a
piece of the National Assessment of Educational Progress (NAEP). 

The setting for this case study begins with recommendations for improvements in the 
NAEP made by the Alexander-James study group in March of 1987, after the House of 
Representatives had already completed its work on the legislative authorization that 
would ordinarily have included NAEP. The House, however, had no proposed 
modifications for the NAEP legal authority that year. 

The Alexander-James recommendations, formulated into a legislative proposal by the 
Department of Education under then Secretary Bennett, were first taken up by the 
Senate and passed by that body with little review. Thus, the House was asked to accept
them in the House-Senate Conference without any hearings of its own and without any
corresponding House measure.

The House conferees were skeptical-skeptical of NAEP, skeptical of tests, skeptical, 
especially, of tests for minority students and others in the Chapter I population, because 
tests did not, per se, improve education for disadvantaged students. The result was that 
the Senate authorization for a state component was made a "pilot" or "trial" program, 
and an "independent" evaluation was mandated. The text of the conference report (see 
Note B) and of the ensuing law (see Note C) are attached. By Washington standards, 
the conference report was especially detailed (never mind that it is repetitive). It asked
for: 

o evaluation of the meaning and reliability of differences in student 
performance observed across States; 

o (amazingly) exploration of "ceiling effects" in high performing States and 
the link of test content with State curricula; and 

o analysis as to whether NAEP data presentations adequately provide a
context for understanding factors that affect education achievement, such
as per capita income, per pupil expenditures, ethnic and racial composition
and level of urbanization.

The law itself directed the Commissioner of Education Statistics to arrange with "a 
nationally recognized organization (such as the National Academy of Sciences or the 
National Academy of Education)" to "assess the feasibility and validity of assessments 
and the fairness and accuracy of the data they produce. . . describe the technical 
problems . . and . . . what was learned about how to best report data. . . ." 

It was clear that Congress was not about to authorize a permanent State assessment 
program without further consideration. 

NAE was commissioned by the National Center for Education Statistics in October
1989 to conduct the Trial State Assessment evaluation. The American Institutes for
Research, a corporate research organization with long experience in testing, joined the 
Academy to provide a continuing home for the necessary staff activities. 

The Panel is a textbook case of a "blue ribbon" group. The co-chairs are Bob Glaser
and Bob Linn; the Executive Director is George Bohrnstedt of AIR. The initial group included
Linda Darling-Hammond, Isabel Beck, Lloyd Bond, Ann Brown, Al Shanker, Gordon
Ambach, Lyle Jones, David Cohen, Lorrie Shepard, Mike Smith, Ramon Cortines,
among others, all well-known names in AERA. It functions by commissioning papers,
designing empirical studies (some requiring field work as part of the NAEP contract), and
holding extensive discussions among members as to interpretation of the evidence set before
them.

To date, 34 studies (see Note D) have been conducted by the Panel on such topics as: 

o Characteristics of the statistical design-sampling, eligibility and exclusion 
issues, and use of NAEP below the State level; 

o Content and curricular validity of the test; 

o Analysis and reporting-influence of choice of content, statistics and 
subpopulation breakdowns; validity of the NAEP achievement levels; 
comparisons of student performance on NAEP with other standardized 
tests; 

o State and local costs; and 

o Impact of the NAEP Trial State Assessment program. 
There have been three major summary reports: (1) on the 1990 Trial, (2) on
achievement levels, and (3) on the 1992 Trial. The Panel has made recommendations
on a variety of topics (see Note E), for example:

o That State NAEP be continued, but that each grade and each subject be 
evaluated; 

o That NAEP should be more inclusive in its coverage of students with
disabilities, LEP students, students in private schools, and out
of school youth;

o That management of State NAEP should be modified in several respects, 
including a tighter requirement for school participation in the sample, 
permitting annual administration instead of biennial, permitting a reduced 
sample for small States; and 

o On the substance and reporting of NAEP, that there be a closer fit of
the math test with NCTM standards, that the achievement levels be treated
as developmental work and separated from the regular NAEP reports, and
that international benchmarks be established.

The Academy Panel has briefed Congressional staff on results and sent notices to the 
press. I have sent reports to Congress and also to all State Departments of Education, 
as required by law, and, recently, to Governors as well. 

INFLUENCE ON NATIONAL POLICY 

Congressional action on NAEP's soon-to-expire legislation has moved only as far as
House passage. But here are some observations at this point about the impact of the 
Panel's work: 

The House took action on many issues dealt with in the NAE reports: 

o It renewed and upgraded the authorization to continue 
the State assessments; 

o But the State assessments were kept as a "trial" for grades or subjects not 
already conducted and evaluated; 

o The House required that achievement levels be used only on a trial basis 
until they meet rigorous evaluation criteria established by the 
Commissioner; reports with achievement levels must be "separate and 
apart" from regular NCES NAEP reports; and 




o The House continued the evaluation requirements and specified that "the
National Academy of Education or the National Academy of Sciences"
must do them (the text no longer says "such as. . ."); these are to cover not only
State assessments but also national assessments, LEA assessments, and
student performance levels.

I do not mean to make unwarranted claims, here, so these need to be labeled as 
"interim" conclusions based on the record so far. The Senate has yet to act, and the 
concluding House-Senate conference has yet to be held. 

My conclusions at this interim stage are that the Academy study has been well received 
and that the intended Congressional audience has paid attention. There is other 
evidence-including sessions at this year's AERA conference-that additional audiences 
have paid attention as well. 

Why is this the case? I would posit several factors:

o The Panel covered what Congress asked, not slavishly, but
with their own expertise applied;

o They did their homework: formulated studies, carried out
empirical work, assessed results, applied judgment;

o They took great pains to comport themselves as impartial
judges and were largely successful in that;

o They wrote their results and conclusions in clear English;

o They observed Washington etiquette in submission, so NCES sent the
report officially to the Hill, as the law specified; and

o They briefed Congress personally.

A COMPARISON WITH AN EARLY MANDATED STUDY 

But was this evaluation study unique, a one-time chance, making an impact never before 
observed and unlikely to recur? Not at all, but some things have changed over the two
decades in which Congressionally mandated studies have become increasingly
frequent. In preparing for today's session, I recalled another Congressional mandate to 
evaluate a Federal program, now twenty years ago, and wondered whether any of my 
observations from this current case study paralleled those made by participants in the 
earlier one. The evaluation was known as the Compensatory Education Study and it was 
headed by Paul Hill, then in the National Institute of Education, from 1974 through 
1977. He has written about his experience in a 1980 Rand Corporation report entitled 
"Educational Evaluation in the Public Policy Setting." 

Paul described five problems faced by that study and what was done to solve them. Two 
of the five were (1) guaranteeing that the research was fair and (2) overcoming 
Congressional distrust of researchers. In the twenty years since authorization of the NIE 
study there has been a considerable change in attitude. Congress and the Executive
Branch may accuse researchers of irrelevance, sometimes, or of failing to connect their
work with real problems. But they do call on researchers; they do mandate evaluations;
they do ask researchers to testify; they do ask for briefings on the work; Congressional
staff readily meet with researchers and with members of evaluation study panels.
Perhaps this is a statement about the education research community as well, a
community wanting to make their work count in important places because, finally, those 
places take actions that can affect American classrooms in powerful ways. 

A third problem faced by the Compensatory Education Study Paul Hill described as 
"resisting pressure from contending parties." In the Compensatory Education case, there 
were contesting positions among the sponsors of the legislation, so controversy was built 
into the statute. That was the case, too, with the Trial State Assessment, and it was dealt with,
I believe, by a strong and continuous effort to build balance into the agenda, and
through extensive deliberations as to interpretations the Panel would provide. The
search for balance goes to great lengths, for example, in the issue of below-State use of
the NAEP tests, that is, by districts or schools. The Academy conclusion, to strip away
the rhetoric, is don't do it, but "If Congress weighs and reads the evidence presented in
this report and decides to lift the ban," then only do so at the district level and only with 
conditions (which are specified). 

The fourth problem identified by Paul Hill was moving from broad objectives in the 
language of the bill to specific researchable problems. In the Trial State Assessment 
evaluation, the law and the Conference Report included more details about what 
Congress wanted than in the earlier Compensatory Study. The Academy has had a free 
hand in formulating its research plan, negotiating with NCES primarily about the level of 
funding and access to NAEP field work as a source of data. Congress made its
primary impact on the study, probably, through its inclusion of language that the study
be performed by a group such as the National Academy of Sciences or the National 
Academy of Education. 

And, finally, Paul Hill noted the problem of making results useful to Congress, specifying
that deadlines had to be met and that reports must be clear and understandable. 
Whether they read the record on the Compensatory Education Study experience or not, 
AIR and the Academy have consistently sought to follow up Paul Hill's advice. Perhaps 
it has helped to have some members who have much experience in communicating with 
policy makers. 

The summing up is this: Researchers and their work can and do have a strong influence
on national policy

when they have evidence for their conclusions,

when their work deals with problems policymakers must solve, and

when they can state their conclusions clearly.
Studies Completed by the Evaluation Panel
of the National Academy of Education

1992

Assessing Student Achievement in the States

George Bohrnstedt, Project Director

A Critique of Sampling in the 1990 Trial State Assessment

Bruce D. Spencer 

Eligibility/Exclusion Issues in the 1990 Trial State Assessment 

Bruce D. Spencer 

Evaluation of the Implementation of the 1990 Trial State 
Mathematics Assessment 

Donald H. McLaughlin, Frances B. Stancavage, Jay G. Chambers,
Elizabeth Hartka, Kadriye Ercikan

The Content and Curricular Validity of the 1990 NAEP Mathematics
Items: A Retrospective Analysis

Edward A. Silver, Patricia Ann Kenney, Leslie Salmon-Cox

The Relative Standing of States in the 1990 Trial State
Assessment: The Influence of Choice of Content, Statistics, and
Subpopulation Breakdowns 

Robert L. Linn, Lorrie Shepard, Elizabeth Hartka 

A Study of the Impact of Reporting the Results of the 1990 Trial 
State Assessment: First Report 

Frances B. Stancavage, Edward Roeber, George Bohrnstedt 

General Issues in Reporting the Results of the NAEP Trial State
Assessment

Richard M. Jaeger

The Case for District- and School-Level Results from NAEP

Ramsay Selden

Cautions on the Future of NAEP: Arguments Against Using NAEP
Test and Data Reporting Below the State Level

Walter Haney, George F. Madaus

Reasonable Inferences for the Trial State NAEP Given the Current
Design: Inferences That Can and Cannot Be Made

Edward E. Haertel 



1993 

Setting Performance Standards for Student Achievement 

Lorrie Shepard, Principal Investigator 
(Background Studies)

An Evaluation of the 1992 NAEP Reading Achievement Levels, Report
One: A Commentary on the Process

David Pearson, Lizanne DeStefano

Validity of the 1992 NAEP Achievement-Level-Setting Process

Donald H. McLaughlin

Order of Angoff Ratings in Multiple Simultaneous Standards

Donald H. McLaughlin

Rated Achievement Levels of Completed NAEP Mathematics Booklets

Donald H. McLaughlin

An Evaluation of the 1992 NAEP Achievement Levels, Report Two:
An Analysis of the Achievement-Level Descriptors

David Pearson, Lizanne DeStefano

Expert Panel Review of the 1992 NAEP Mathematics Achievement
Levels

Edward A. Silver, Patricia Ann Kenney

Comparison of Teachers' and Researchers' Ratings of Students'
Performance in Mathematics and Reading with NAEP Measurement of
Achievement Levels

Donald H. McLaughlin (and 13 other authors)

Comparisons of Student Performance on NAEP and Other Standardized
Tests

Elizabeth Hartka

Comparing the NAEP Trial State Assessment Results with the IAEP
International Results

Albert E. Beaton, Eugenio J. Gonzalez

An Evaluation of the 1992 NAEP Reading Achievement Levels, Report
Three: Comparison of Cutpoints for the 1992 NAEP Reading
Achievement Levels with Those Set by Alternative Means

David Pearson, Lizanne DeStefano 



1993 

The Trial State Assessment: Prospects and Realities

George Bohrnstedt, Project Director 



1994 (Background Studies) 

A Study of Eligibility Exclusions and Sampling: 1992 Trial State
Assessments 

Bruce D. Spencer 

A Study of Students Excluded from the 1992 National Assessment of 
Educational Progress Trial State Assessment 

Elizabeth Hartka 
State and Local Costs of the 1992 Trial State Assessment

Catherine O'Donnell, Jay Chambers, Dey Ehrlich

The Content and Curricular Validity of the 1992 NAEP Reading
Framework

Bertram C. Bruce, Jean Osborn, Michelle Commeyras

Evaluation of the 1992 Reading Framework for the National
Assessment of Educational Progress

Julia H. Mitchell

The Content and Curricular Validity of the 1992 NAEP TSA in
Mathematics

Edward A. Silver, Patricia Ann Kenney

Content Validation of the 1992 NAEP in Reading: Classifying
Items According to the Reading Framework

David Pearson, Lizanne DeStefano

Impact of the 1992 NAEP Trial State Assessment Program: A
Followup Study

Frances B. Stancavage, Edward D. Roeber, George W. Bohrnstedt

Issues in the Development of Spanish-Language Versions of the
National Assessment of Educational Progress

Walter G. Secada

The Judged Congruence Between Various State Assessment Tests in
Mathematics and the 1990 National Assessment of Educational
Progress Item Pool for Grade-8 Mathematics

Lloyd Bond, Richard M. Jaeger, assisted by Sarah E. Putnam

A Study of the Administration of the 1992 National Assessment of
Educational Progress Trial State Assessment 

Elizabeth Hartka, Donald H. McLaughlin 



Studies Proposed by the National Academy 
of Education for 1994-95 

NOTE: The list of titles and authors for these proposed studies 
is tentative. 

1) Study of IEP and LEP Exclusions

George Bohrnstedt, Frances Stancavage

2) Alternative Assessments for IEP and LEP Students

George Bohrnstedt, Frances Stancavage

3) Study of the 1994 Exclusions and Sampling Frame

Donald H. McLaughlin

4) Study of the 1994 TSA Administration

Elizabeth Hartka

5) Impact of Public School Nonparticipation

Donald H. McLaughlin

6) Combining State and National NAEP

Edward Haertel

7) Linking State and National Assessments

Richard Jaeger

8) Anchoring Achievement Levels

Donald H. McLaughlin

9) Content Validity of the 1994 Reading Assessment

David Pearson

10) Impact and Reporting of NAEP Results

Edward Roeber, Frances Stancavage

11) Study of NAEP Scaling: Trends, Content, Mode of Assessment

Robert Linn, Donald McLaughlin

12) Acquisition of Competence and its Relevance for NAEP
Assessments 

Robert Glaser, Donald McLaughlin 

13) The Capstone Report 

Robert Glaser, Robert Linn, George Bohrnstedt 
Note E

Topics of NAE Recommendations

Continuation of State NAEP

Continue State NAEP but conduct evaluations of grades and
subjects not covered and merge State evaluation with
national

Coverage of State NAEP 

Students in private schools

IEP (study exclusions)

Out of school 17 yr. olds OR cohort in NALS 

Trial in Spanish for LEP students 

Not below State level OR very limited and not to schools 

Management of State NAEP 

Continue to monitor sites for uniformity

Tighter requirement for school participation rates 

Merge State and national samples for efficiency 

Permit annual administration to distribute workload 

Use half-sample size for small States 

Do not increase State cost sharing 

Provide adequate funding for a quality program 

Substance and reporting of the Assessment 

Test coverage and item types more consistent with NCTM
Set achievement levels in coordination with NESIC
Use focus groups to help determine useful displays
Provide examples for press of proper interpretations
Discontinue use of Angoff methods for achievement levels
Discontinue reporting by achievement levels as used in 1992
Ask for standard setting advice from more diverse sources
(such as business leaders, standards committees, content

Publish achievement levels separate from official reports
Use percentile scores to monitor achievement
Use international comparisons to set benchmarks and provide
for equating with TIMSS
Work with NEGP to determine how to report over the 1990s
Implement within-grade score reporting

Long term recommendations on Performance Standards

Develop content standards and performance standards in an

Continuous oversight group from frameworks through reporting
Address issue of developmental model that underlies
achievement levels and scales

Evaluate achievement levels before use for regular reporting

Recognize need for multiyear developmental process

Provide for stability of measures over an 8 to 10 year period